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ABSTRACT 

We present a new, simple method to predict activity-induced radial velocity varia- 
tions using high-precision time-series photometry. It is based on insights from a sim- 
ple spot model, has only two free parameters (one of which can be estimated from 
the light curve) and does not require knowledge of the stellar rotation period. We 
test the method on simulated data and illustrate its performance by applying it to 
MOST/SOPHIE observations of the planet host-star HD 189733, where it gives almost 
identical results to much more sophisticated, but highly degenerate models, and syn- 
thetic data for the Sun, where we demonstrate that it can reproduce variations well 
below the ms"'^ level. We also apply it to Quarter 1 data for Kepler transit candidate 
host stars, where it can be used to estimate RV variations down to the 2-3 ms~^ level, 
and show that RV amplitudes above that level may be expected for approximately 
two thirds of the candidates we examined. 

Key words: methods: data analysis - techniques: photometric - techniques: radial 
velocities - Sun: activity - stars: individual: HD 189733 - planetary systems 



variations. Again, there is clearly a relationship between the 
two - in particular, the lower envelope of the RV scatter 
increases for increasing levels of activity - but there is also 
a wide range of RV scatters for a given spectral type and 
activity level. Furthermore, the 'jitter' formalism is limited, 
because it treats the activity signal as an independent, iden- 
tically distributed Gaussian noise process. Activity-induced 
RV signals arise from the rotational modulation and intrin- 
sic evolution of magnetized regions, and are thus naturally 
correlated in time, often quasi periodic, and non-stationary. 
Therefore the impact of variability-induced RV signals on 
planet detection will generally be much more severe than 
that of a random jitter of the same mean amplitude. 

For individual stars, somewhat more sophisticated ap- 
proaches are in common use, mostly making use of chromo- 
spheric activity indicators, such as excess flux in the cores of 
the Ha and Ca ll H & K lines, of measurements of the degree 
of asymmetry of the spectral lines, such as the bisector span 
of the cross-correlation function (CCF) between the stellar 
spectrum and a template, which is often used to derive the 
RV itself. The most obvious step is to check for periodic 
modulation in these indicators, which can reveal that a sus- 
pected planetary signal is in fact due to activity (see e.g. 
IQueloz et al.|[20M] [Bonflls et al.||2007| ). The correlation be- 
tween RVs and bisector span can also be used to correct for 



1 INTRODUCTION 

Stellar activity induces brightness and line-shape variations 
which can mimic planetary signals and hinder the detec- 
tion and/or characterisation of the latter. In particular, in 
radial velocity (RV) surveys, many stars display intrinsic 
variability which is attributed to activity, and which occurs 
on timescales similar to the planetary signals of interest. 
There is thus significant interest in characterising the level 
of activity-induced RV variability in stars targeted by planet 
surveys, and in developing tools to distinguish between the 
latter and planetary signals. 

The simplest, widely used method to deal with activity- 
induced variability in RV surveys is to add a 'jitter' term 
in quadrature to the RV uncertainties before searching for 



planetary signals. Wright ( 2005 1 proposed an empirical re- 



lation to predict the magnitude of this jitter term from a 
star's activity level, B — V colour and absolute magnitude, 
calibrated on 450 targets from the California and Carnegie 
planet search, but this relation is far from tight. More re- 



cently, Isaacson & Fischer (2011 1 determined chromospheric 



activity levels for over 2500 target stars from the same sur- 
vey, and compared them to the RMS scatter of their RV 
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the effect of activity at the few m s ^ level ( Melo et al.|2007 
Boisse et al. 2009 1. However, this correlation is highly depen- 



dent on the spot distribution and stellar inclination (Boisse 
et al.|20TT l and is not always present. In Pont et al. (2011 1, 



we developed another method, which consists in modeling 
the variations of the stellar brightness and the CCF bisec- 
tor span using a spot model, to predict the activity-induced 
variations. Spot models are very degenerate: many spot dis- 
tributions with widely different numbers of spots and spot 
parameters fit the data equally well, so the predictions of 
many models must be averaged. One way to address this 
degeneracy is to use maximum entropy regularisation, as 



was done by Lanza et al. (20071, to model the Sun's total 



irradiance variations. Recently, Lanza et al. (20111 applied 



this method to the planet-host star HD 189733, using pho- 
tometric observations performed by the MOST satellite to 
predict the activity signal in simultaneous SOPHIE RV ob- 
servations, and obtained slightly better results than |Boisse| 
et al. (20091, who had previously analysed the same dataset 



using the bisector span de-correlation method. 

The data from space-based transit surveys such as 
CoRoT and Kepler contain a wealth of information about 
stellar activity, but the spot-modeling approaches above are 
computationally intensive to be applied systematically to 
many light curves. They are also dependent on fairly de- 
tailed a-priori knowledge of the star being modeled, and in 
particular of its rotation period. In this paper, we present a 
new, much simpler method to predict the activity-induced 
RV variations for a given star from its photometric varia- 
tions, which can be applied to stars whose rotation period 
is not known. This is intended primarily for statistical pur- 
poses: to characterise the overall RV variability properties 
of a sample of stars, and to select the best targets for RV 
follow-up, for example. However, as Kepler continues to ob- 
serve the candidate planets it has identified while they are 
being followed up from the ground, our method may also 
prove useful in reducing contamination of the RV data by 
activity signals in individual cases. 

Our method is based on a very simple spot model, 
which we introduce in Section (2] We then make use of a 
relationship between the photometric and RV signatures of 
individual spots to formulate a means of simulating the RV 
variations based on the light curve only, as outlined in Sec- 
tion [S] In Section [4] we test the method on simulated data 
and give three example applications: the SOPHIE/MOST 
observations of HD 189733b already used as a test case by 



Boisse et al. (20091 and Lanza et al. (20111, total irradiance 



and RV variations of the Sun synthesized by |Meunier et alT] 
( |2010b[ ), and Kepler quarter 1 light curves for a subset of 
the planetary candidates published by |Borucki et al. (2011 1. 
We present our conclusions and future plans for applying 
the method to other datasets, in Section [5] 



2 A SIMPLE SPOT MODEL 



Dorren ( 1987| provided analytic expressions to model the 
photometric signature of a circular spot on a rotating stel- 
lar surface, accounting for limb-darkening using a single- 
coefficient, linear limb-darkening law. It is possible to ex- 
pand these to also model the RV signature of such a spot, 
and to account for both dark spots and bright faculae by 



allowing the contrasts and limb- darkening coefficients to 
change sign. However, the resulting algebra is relatively cum- 
bersome. Here we seek a simpler model, whose mathematical 
expressions are simple enough to afford a more direct insight 
into the dependency of the photometric and RV signatures 
of active regions on the various parameters, while retaining 
as much realism as possible. In order to achieve this, we 
make a number of simplifications, the impact of which we 
will test by comparing our results to data simulated using 
the more complete Dorren ( 1987 1 formalism. 



2.1 Photometric signature of a point-like spot 

To model the photometric signature of dark spots, we as- 
sume that the spots are small, i.e. that a <C 1, where a is the 
spot's angular radius on the surface of the star. This allows 
us to ignore projection effects within a spot, and obviates 
the need to assume a particular shape for the spots. It also 
enables us, when considering multiple spots, to assume that 
they never overlap. Additionally, we ignore limb-darkening. 
This alters the photometric signature of dark spots only 
slightly, since both the spot and the un-spotted photosphere 
are darker towards the limb than they are near the centre 
of the stellar disk. 

Under these conditions, the relative drop in flux due to 
a single point-like, dark spot rotating on the stellar surface 
is 



F{t) = f MAX{cos/3(t);0}, 



(1) 



where / — 2(1 — c)(l — cos a) represents the relative flux 
drop for a spot at the disk centre, c is the contrast ratio 
between the spot and the un-spotted photosphere, and I3{t) 
is the angle between the spot normal and the line of sight. 
This angle is given by 



cos I3{t) — cos (l}{t) cos S sin i + sin 5 cos i , 



(2) 



where i is the stellar inclination (the angle between the star's 
rotation axis and the line of sight), 5 is the latitude of the 
spot relative to the star's rotational equator, and (/!>(t) is the 
phase of the spot relative to the line of sight (see Appendix [A| 
for details). The latter is of course (j){t) = 27rt/Prot + (l>o, 
where Prot is the star's rotation period and (j>o is the lon- 
gitude of the spot (i.e. we take the stellar meridian to be 
aligned with the line of sight at t = 0). The observed stellar 
fiux is then simply 



*(t) = *o [1 - Fit)] , 



(3) 



where ^0 is the fiux in the absence of spots. 

By allowing c to exceed one, one could use the equations 
above to model the signature of bright spots such as facu- 
lae. However, faculae on the Sun are limb-brightened (see 
e.g. [Lanza et al.||200"7| [Meunier et al.|[2010a| and references 
therein), and any model that ignores the limb-angle depen- 
dence of the contrast will not reproduce their photometric 
signature well. Thus we have opted not to include faculae 
in our model. Fortunately, the latter typically have low con- 
trast, and hence the missing photometric effect tends to be 
small. Comparative studies of Sun-like stars by |Radick et al.| 
(1998| and |Lockwood et al.| ( |2007t also suggest that faculae 
are less important in stars more active than the Sun. 
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2.2 RV signature 

The most important effect of the spot in RV is due to the 
fact that it suppresses the flux emitted by a portion of the 
rotating steUar disk, thus introducing a perturbation to the 
disk-averaged RV. Provided c is not close to one, this pertur- 
bation can be estimated simply by multiplying the projected 
area of the spot, which is given by F{t), by the RV of the 
stellar surface at the location of the spot: 



ARVrot{t) = -F{t) Voq cos(5sin0(t) sini. 



(4) 



where 14q — 27ri?*/Prot is the equatorial rotational velocity 
of the star, and _R* the stellar radius. Differential rotation 
can be included in this formalism by allowing Vcq (or equiv- 
alently Prot) to vary as a function of the spot latitude. 

Spots tend to be associated with magnetized areas 
which, while they have very limited photometric contrast, 
are much more extended spatially. These do have an im- 
portant impact in RV, because convection is partially sup- 
pressed within them, leading to a reduction in the convec- 
tive blue-shift (see Meunier et al. 2010a|b and references 
therein). Within our simplified formalism, it is possible to 
approximate the resulting RV perturbation as 



ARVc{t) = +F{t) SVc K cos I3{t) 



(5) 



where 5Vc is the difference between the convective blue-shift 
in the unspotted photosphere and that within the magne- 
tized area, and k is the ratio of this area to the spot surface 
(typically ^ 1). The total RV signature of the spot and 
associated magnetised area is then simply 

ARV{t) = ARVrot{t) + ARV,{t). (6) 



2.3 Examples 

Figure [T] shows light and RV curves simulated using this 
simple model for an equatorial spot (solid black line) and 
a high-latitude spot on an inclined star (solid cyan line). 
The dotted grey line in the bottom panel shows the equato- 
rial spot case without the convective blue-shift suppression 
term. Also shown for comparison is the same equatorial spot 
modeled with the more sophisticated formalism of |Dorren] 
( |1987[ ), who gives analytical expressions for a circular spot 
of finite size on a limb-darkened photosphere. Figure[2]shows 
the corresponding amplitude spectraQ which we use to eval- 
uate the impact of the simplifications we have made on the 
frequency content of the simulated light and RV curves. 

In all cases, the amplitude spectrum of the light curve 
is dominated by the rotational frequency, as one might 
expect. There is also signal at the second, fourth, sixth 
and higher even- numbered harmonics v — 2n/Prot, where 
n = 1,2,3,..., although the amplitude decreases rapidly 
with n. On the other hand, there is essentially no sig- 
nal at odd-numbered harmonics v = (2m -I- l)/Prot, where 
m = 1,2, 3, 



As previously noted by Boisse et al. (2009 



^ Throughout this paper, we use amplitude spectra computed 
by linear least-squares fitting of a sinusoid with free zero-point, 
amplitude and phase at each frequency. This is akin to the gen- 
eralised periodogram of Zcchmcist er &: Kiirster] | |2009[ | but ex- 
pressed in units of amplitude rather than x'' reduction. 
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Figure 1. Simulated photometric and RV signatures of a single 
spot (top and bottom respectively). The solid black line shows 
the output of our simple model for a fairly large, dark, equatorial 
spot (c = 0, a = 10 °, 5 = 0) on a star with Prot = 5 days, 
j = 90°, 5Vc = 200ms-i and k = 10 (see text for details). The 
solid cyan line is the same, but for a higher latitude spot on an 
inclined star (5 = 60°, i = 70°). For comparison, the dashed 
lines show the same spots modeled with the formalism of |Dorren] 
l |1987| l (stellar linear limb-darkening parameter = 0.5). For 
simplicity, we have omitted the Dorren formalism for the high- 
latitude case in the bottom panel. Instead, the black dash-dot 
line shows the equatorial spot simulated with our simple model, 
but without the convective blue-shift effect (SVc = 0). 
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Figure 2. Amplitude spectra for the light and RV curves shown 
in[T] using the same colour-coding. These spectra were computed 
from time-series lasting 5 Prot- Frequencies are expressed in units 
of inverse rotation periods. 



2011 1, the RV signature is dominated by the first three har- 



monics of the rotational frequency, with additional signal 
at odd-numbered harmonics, although at much lower am- 
plitude. The differences between the distribution of power 
at higher harmonics in photometry and in RV arise because 
the photometric signal is maximised when the spot is face 
on, which occurs once per disk-crossing, but the RV signal 
is maximised when the spot longitude is 45°, which occurs 
twice per disk crossing. 
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Changing the spot latitude and steUar inchnation can 
substantially alter the light and RV curves, as it changes 
the fraction of the rotation cycle over which the spot remains 
in- or out-of-view, and the rotational velocity of the occulted 
parts of the stellar disk. This results in significant changes in 
the relative amounts of power at the different harmonics of 
the rotational frequency, in some cases entirely suppressing 
all but the fundamental, or on the contrary giving rise to 
relatively large amplitudes at high-n harmonics. Projection 
effects over the area of the spot, and limb-darkening (both 
of which are accounted for in the |Dorren||l987| formalism, 
but not in our simple model), alter the shape and maximum 
amplitude of the light curve, and hence the balance of power 
between the fundamental and second harmonic, but not in 
a very substantial way (except for extremely large spots). 
The effect on the RV signature is even smaller. 

The convective blue-shift causes the RV perturbation to 
be biased upwards and to depart from an exact sinusoidal 
shape. As ARVc(t) is proportional to F^, the power of the 
convective component of the RV signature is concentrated 
at the rotational frequency and its first harmonic. Except 
for extreme cases, the effect of convective blue-shift on the 
frequency content of the RV curve remains minor. 



3 THE FF' METHOD 

3.1 Relationship between photometric and RV 
signatures 

Considering the equations of our spot model, as presented 
in Section |2] it is interesting to note that 

F{t) — —f sin 0(f) <f>(t) cos (5 sini 

27r 



-/ sin (f>{t) cos 5 sini 



(7) 



Therefore the expression for the RV signature of spots can 
be re-written: 

ARVrotit) = -~F{t) F{t) R,/f. (8) 
This can be estimated directly from the light curve, as 



Fit) = 1 
hence 



*0 



ARVrotit) = 



*0 



and 



*0 



Fit) 



*0 ' 



R. 
f 



Similarly, the convective blue-shift effect is given by 

ARVcit) = +F\t)5Vc>i/f, 

which can be estimated from the light curve as 



ARVcit) = + 



1 - 



*0 



SVcK 



(9) 



(10) 



(11) 



(12) 



The above expressions provide a means to simulate 
activity-induced RV variations based on a well-sampled light 
curve, without knowing the rotation period. As this method 
uses the light curve and its own derivative, we refer to it as 
the FF' method. 

The effect of multiple spots simultaneously present on 
the stellar surface is additive: the observed fiux modula- 
tion is the sum of the contributions from individual spots, 



and similarly for RV. However, strictly speaking, the reason- 
ing behind the FF' method cannot be extended to multiple 
spots, since the direct relationship between the RV signature 
of each spot and the light curve breaks down. Therefore, we 
expect the FF' method to perform best when a single active 
region dominates the visible hemisphere. When multiple ac- 
tive regions are present, we are essentially making a first or- 
der approximation: therefore the FF' should still reproduce 
the dominant features of the RV variations, but will neces- 
sarily be less accurate, particularly on short timescales. We 
shall test the extent to which this is the case using both 
observed and simulated data. 



3.2 Practical implementation 

To compute the RV signal from the light curve, one must 
estimate i?*, f and SVci^- We will assume that is 
known at least approximately, which is generally the case. 
If the distribution of active regions is relatively smooth, so 
that there are always several active regions on both of the 
visible and the hidden hemisphere, then the maximum of the 
observed light curve, $max, will be offset from ^'o. In that 
case, one may expect a direct scaling between this offset 
and the amount of variability in the light curve scatter. We 
adopt the expression 



*o ~ ^max + ka. 



(13) 



where a is the light curve scatter and is a free parame- 
ter. We use the scatter rather than, say, the peak-to-peak 
amplitude, because - except for nearly pole-on stars, whose 
variability will in any case be minimal - active regions which 
are always visible are necessarily relatively near the limb, 
whereas the lowest excursions in the light curves are pre- 
sumably caused by active regions which come close to the 
centre of the visible disk. Initial tests regarding the k on 
a number of simulated test cases and on the observations 
of HD 189733, which are described in Section [4.3] suggested 
that fe = 1 is a suitable value for relatively active stars, and 
this is the value we use by default. However, when enough 
data is available to constrain a free parameter, it is advis- 
able to fit for either k or 'I'o itself. Once ^to is determined, 
the largest observed departure from it provides a fairly good 
estimate of the spot coverage: 



^0 - <iu 
*0 



(14) 



where "l?min is the minimum observed flux. Again, this scal- 
ing relation performed well in initial tests on simulated data 
and observations of HD 189733b. Even in data-rich situa- 
tions, it is rarely helpful to fit for both / and $o as the two 
parameters are mutually degenerate. 

For well-sampled light curves, the derivative of the flux 
can be estimated directly from the difference between con- 
secutive data points. This procedure is highly sensitive to 
high-frequency noise, so the light curve must be smoothed 
flrst. One possibility is to do this using the non-linear fllter 
of Aigrain & Irwin ( 2004| with a smoothing length approxi- 
mating a tenth of the rotation period (or, when the latter is 
not known, of the dominant periodicity in the light curve). 
For an example application using this filter, see Section [4. 3| 
When the time-sampling is not sufficient, an alternative is 
to model the light curve using Gaussian process regression 



Activity-induced RV variations from photometry 5 




-0.2 0.0 

Phase 



Figure 3. Photometric and RV signatures of spot distributions 
matching low-order multipoles with 1, 2, 3 and 4 nodes along the 
equator (solid black, dashed cyan, dash-dot magenta and dot- 
ted grey line respectively). The spot distributions are uniform in 
sin(5). The stellar parameters are identical to the fiducial values 
used in Figure [T] 



(see Section 4.4 for an application, and Appendixplfor more 



details on Gaussian process regression). 



4 TESTS AND APPLICATIONS 

4.1 Photometry is insensitive to certain spot 
distributions 

The FF' method rests on the assumption that the photo- 
metric signal contains sufficient information to adequately 
predict the RV signal. We test this assumption here by ex- 
amining the relative amplitudes and shapes of photometric 
and RV signals from spot distributions approximating low- 
order multipoles in the longitudinal direction. We simulated 
light and RV curves for stars with 200 spots, each with c — 
and a — 0.01°. The spot longitudes (jjo were drawn at ran- 
dom from the desired distribution, and their latitudes were 
taken from a uniform distribution in sin((5). All the parame- 
ters of each spot were fixed (no spot evolution) and all spots 
shared the same rotation rate (no differential rotation). 

The results of these tests are shown in Figure [3] While 
both photometric and RV signals are sensitive to dipole 
(black line) and quadrupole (dashed cyan line) configura- 
tions, the photometric signal is only very marginally sensi- 
tive to higher order odd-numbered multipoles (e.g. order 3, 
dash-dot magenta line), which do give rise to some RV vari- 
ation. This can be understood intuitively as a consequence 
of the symmetry of the problem. The same phenomenon has 



aheady been noted by Cowan & Agol ( 2008 1 in the context 



of phase-function mapping of exoplanets: sinusoidal maps 
with odd order have no photometric phase function signa- 
ture. Conversely, the RVs are not very sensitive to higher or- 
der even-numbered multipoles (e.g. order 4, dotted grey line) 
configurations, although this is less noticeable. This implies 
that any attempt to simulate activity-induced RV variations 
based on photometry - whatever the method - is likely to 
over-estimate the signal at high-n (even-numbered) harmon- 



ics, and underestimate that at high-m (odd-numbered) har- 



4.2 Simulations using multiple, evolving active 
regions 

We used the spot model described in Section [2] to generate 
flux, RV and bisector span time series and test the ability of 
the FF' method to recover the RV signal based on the flux 
information only. 

We generated 100 time-series lasting 100 days each, each 
with 20 randomly distributed, evolving, differentially rotat- 
ing spots, and another set of 100 time-series with 200 spots. 
We computed photometric and RV time-series lasting 100 
days, with one point every 0.05 day. For each realisation, the 
stellar inclination was drawn at random from a distribution 
uniform in cos(i), and the spots were distributed uniformly 
in sin((S), and rotated differentially depending on their lati- 
tude, following P{S) = Po+0.1 sin((5), with Pq — 5 days in all 
cases. The angular size a of each spot followed a squared ex- 
ponential growth and decay, with e-folding times drawn from 
a log-normal distribution with mean 0.4 log Pq and standard 
deviation 0.15. The peak times were drawn from a uniform 
distribution over the range [— L/2,3L/2], where L was the 
light curve duration. 

The purpose of these simulations is to identify funda- 
mental limitations of the method, rather than to evaluate 
its performance in fully realistic conditions. Therefore, we 
did not add noise to the simulated data. Each photometric 
time-series was processed with the FF' method, as outlined 
in Section [3.2| to generate synthetic RVs. As no noise was 
added, no smoothing was necessary. The resulting RVs are 
compared to the output of the spot model for a subset of 
the simulations in Figure [4] There is generally very good 
agreement between the time series, but the FF' RVs occa- 
sionally depart from the output of the spot model by up to 
a third of the peak-to-peak amplitude. In those cases, the 
periodograms (which otherwise show excellent agreement) 
show a different ratio between the odd- and even-numbered 
harmonics. This occurs when the spot distributions approxi- 
mate odd-numbered multipoles: as noted in the previous sec- 
tion, this causes RV variations with virtually no counterpart 
in photometry. In other words, as expected, RV variations 
at the fundamental and first harmonic are well reproduced 
by the FF' output and strongly suppressed in the residuals 
(often by as much as 80%), but variations at higher, even- 
numbered harmonics remain, and can even be enhanced. 

We also computed and compared the overall RMS of 
the 'true' (spot-model) RVs, the FF'output, and the resid- 
uals, and the best fit sinusoidal period and amplitude in 
the FF' output compared to the true RVs. The RMS of 
the residuals is reduced by > 50% compared to the origi- 
nal value in 54% of the simulations with 20 spots, but this 
performance is achieved in only 16% of the simulations with 
200 spots. This is because, when there are so many spots, 
the spot distribution almost always has an odd-numbered 
multipole component. Nonetheless, as shown in Figure |5] 
the RMS, amplitude and period of FF' output are always 
well-correlated with the corresponding values for the 'true' 
RVs. For simulations using 20 spots, the Pearson correlation 
coefficients for the RMS and amplitudes are 0.90 and 0.84, 
respectively, with slightly lower values of 0.80 and 0.75 for 
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time (days) time (days) Frequency (cycles/p,.;) 

Figure 4. Simulation examples: flux variations simulated with the spot model (left), RV variations (middle) produced by the spot model 
(blue) and by the FF' method applied to the flux data shown in the left panel (black), and corresponding periodograms (right). These 
10 examples were selected at random from the 100 simulations with 20 spots. 




Figure 5. Comparison of RV variations simulated with our spot model (x-axis) to the predictions of the FF' method applied to the 
photometric output of the same simulations (y-axis). The left, middle and right column show the time-series RMS scatter, the period 
and the amplitude of the best-fitting sinusoidal modulation, while the top and bottom rows correspond to 20 and 200 spots, respectively. 
The black line in each panel follows y = x. 
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Figure 6. Photom etry and RV time- series for HD 189733. The 
MOST light curve i jBoisse et al.|2009| l is shown in the top panel 
and the observed and simulated RV data are compared in the bot- 
tom panel. The black dots with error bars show the SOPHIE data 
from [Lanza et al.| | |2011[ | after removal of the best-fit planetary 
signal, and subtraction of a constant 21.6 ms— ^ offset. The grey 
line shows the RV curve simulated by applying the FF' method 
to the MOST light curve, and the grey dots show the same curve 
linearly interpolated to the sampling of the SOPHIE observations. 



200 spots. There is a slight tendency for the FF' to over- 
predict the ampUtude of the variations: linear fits to the 
scatter plots shown in Figure [5] yield slopes of 1.08 and 1.06 
(1.1 and 1.17) for the RMS and amplitude respectively, for 
20 (200) spots. The period, or its first harmonic, is almost 
always correctly recovered (up to a precision of 10%). Inci- 
dentally, the fraction of the cases where the first harmonic 
dominates over the fundamental is smaller for the FF' out- 
put than for the direct spot-model simulations. 

Thus, except in particularly favourable cases, the FF' 
method does not enable a precise 'correction' of the RV 
variations due to activity, prior to searching for low mass 
planets, for example. However, it does permit a statistical 
comparison in terms of amplitudes and frequency content. 
Nonetheless, further tests on real data are desirable to estab- 
lish the performance of the FF' method on a firmer footing. 

4.3 Application to HD 189733 

The transiting planet host star HD 189733 was the target of 
intensive simultaneous monitoring with the RV spectrograph 
SOPHIE and the photometric satellite MOST. An in-depth 
analysis of these observations from the activity point-of-view 



was already presented in Boisse et al. (20091. This dataset 



constitutes a useful test for RV jitter simulation methods 
based on photometry, and was recently used for this specific 



Starting from the MOST light curve, which is shown 
in the top panel of Figure |6] we simulated the expected 
activity-induced RV variations using the FF' method. The 
light curve was first smoothed using the iterative non-linear 



filter of Aigrain & Irwin ( 2004 1 using a baseline of 6 data 
points (~ 10 h) to reduce the noise on the time-derivative es- 
timate. The results are shown as the grey line in the bottom 
panel of Figure |6] We then compared this to the SOPHIE 
observations, which are shown as black dots. Note that we 
used the new reduction of the SOPHIE data, as described 



by Lanza et al. ( 2011 1, and worked with the residuals of the 



planetary or bit (I. Boisse, priv. comm. 



et al. 



Following Lanza 
r 



(20111, we subtracted a constant offset of 21.6ms 
from the SOPHIE orbit residuals. We then linearly inter- 
polated the FF' output to the sampling of the SOPHIE 
observations (grey dots in Figure [6]|. 

The interpolated FF' output is a good match to the 
orbit residuals except from HJD = 2 454308 to 2 454 310, 



and is virtually identical to the Lanza et al. (20111 re- 
sults throughout. The latter already noted that their model 
could not reproduce the very rapid drop observed in the 
RVs around this time. One possible explanation may be 
that the spot distribution around this time had a significant 
odd-numbered multipole component, which no photometry- 
based method could recover. However, we note that, around 
HJD = 2 454 308, the observed flux also dips faster than can 
be reproduced by an unevolving surface feature rotating into 
view. This suggests that there are rapidly evolving active re- 
gions on the star at this time, a situation which neither the 
FF' method nor the method of Lanza et al. (2011 1 are well 
suited for. 

The reduced of the SOPHIE orbit residuals (ex- 
cluding the problematic interval from HJD — 2 454 308 to 
2 454 310) is 14.45, and their r.m.s is 9.4 ms~^. Subtracting 
the activity contribution, as predicted by the FF' method, 
reduces the reduced by a factor > 2 to 6.58, and the r.m.s 
to 6.6 m s~^ . Although the presence of a residual activity sig- 
nal cannot be excluded, the final r.m.s is consistent with the 
level of instrumental systematics typical of SOPHIE at the 
time of the HD 189733 observations (~ 5ms~^, Boisse priv. 
comm.). 

We also computed the Lomb-Scargle periodogram of the 
RV data before and after subtracting the simulated activity 
signal (Figure [7|. For this calculation, we used only data si- 
multaneous with the MOST observations, and excluded the 
aforementioned discrepant data points. Again the results are 



almost identical to Figure 6 of Lanza et al. ( 2011 1. Subtract- 



ing the output of the FF' method suppresses power at the 
rotational frequency by a factor close to 10, and gradually 
decreasing fractions of the power at each harmonic. The last 
significant peak (the third harmonic) is suppressed by a fac- 
tor of ~ 2 only. 

In summary, despite its simplicity, the FF' method 
gives results that, at least in this specific case, are equivalent 
to the more sophisticated approach of [Lanza et al.| ( [2011[ ), 
and achieves the same performance in terms of RV power 
suppression. One possible explanation for this is that the 
maximum entropy regularisation employed by [Lanza et al.[ 
(20111 effectively places a prior on the spatial and tempo- 



purpose by Lanza et al. (2011 1 



ral scales accessible to their model. This has a similar effect 
to the approximations made in the FF' method, which is 
only a first order approximation to the full expression for 
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Figure 7. Lomb-Scargle periodograms of the observed RV time- 
series for HD 189733 after subtracting only the planetary orbit 
(solid black line) and after removing the simulated activity signal 
also (dashed grey line). Both RV datasets have the same time 
sampling, including only SOPHIE data points simultaneous v^ith 
the MOST observations, and excluding the 4 data points between 
the vertical dashed lines in Figure[6] The rotational frequency and 
its harmonics are indicated by the vertical dotted lines. FoUoviring 
[Boisse et al.| l [2009[ l, we took the rotation period to be 11.953 days. 



multiple spots, and is therefore expected to match the sig- 
nature of the dominant active regions only. Therefore, one 
should be cautious not to over-interpret the output of both 
models, particularly as regards short-term behaviour - un- 
fortunately, a detailed comparison of the two methods in the 
short-timescale regime is difficult because the RV data for 
HD 189733 have relatively sparse time-sampling, and uncer- 
tainties comparable to the short-term activity signal. 

4.4 The Sun 

The Sun, as the nearest and best-studied star available to us, 
is an ideal test case for the FF' method. Its brightness and 
RV variations have been intensively monitored for decades, 
from space (e.g. by the SoHO satellite) and from the ground 
(e.g. by the GONG and BiSON networks). However, solar 
RV monitoring projects are primarily intended to study the 
Sun's 5-minute oscillations, and we were not able to find 
a publicly available set of full-disk RV measurements that 
were calibrated over long timescales. 

We therefore used the synthetic dataset of |Meunier et al.| 
( |2010b[ ). This dataset was generated by identifying different 
types of magnetic features on SoHO/MDI magnetograms, 
and simulating their effect in both photometry and RV, tun- 
ing the model to fit the total solar irradiance (TSI) varia- 
tions observed by SoHO/VIRGO. We focus on two 6-month 
long sections at opposite phases in the solar cycle, one dur- 
ing a period of relatively low activity, and one during a pe- 
riod of relatively high activity. For each, we were provided 
with synthetic TSI and RVs sampled approximately once 
per day (A.-M. Lagrange, priv. comm.). The synthetic RVs 
are, of course, subject to the limitations of the |Meunier et ah] 
(2010b I model. However, applying the FF' to the synthetic 



fairly stringent test of the very limited treatment of faculae 
and plage used in the FF' method, since the latter are the 
dominant effect in the synthetic RVs. 

As can be seen in the top panel of Figure |8j the syn- 
thetic TSI (black dots) is a good match to the observed TSI 
(grey line), but it is not smooth. Its time-sampling is also 
slightly irregular and relatively sparse (< 1 point per day). 
This makes smoothing using the iterative non-linear filter 
discussed in Section [3. 2| which gave satisfactory results for 
the MOST time-series of HD 189733 in Section [^S] inappro- 
priate for this dataset. Instead, we performed a Gaussian 
process (GP) regression on the synthetic TSI. More details 
on the GP regression are given in Appendix [B] 

The resulting smoothed TSI (shown as the black solid 
line in the top panel of Figure [8| was then fed in to the FF' 
method, and the results are compared to the synthetic RVs 



in the bottom panel. We first used Equation ( 13 1 to estimate 



'i'o, and set kSVc to zero (black line). We also tried fitting 
for those two parameters, using a downhill simplex optimizer 
to minimize the sum of the squared differences between the 
synthetic RVs and the FF' output linearly interpolated to 
the same time-sampling (thick red dashed line). Finally, we 
also tried setting ^'o to 1, corresponding to an absolute irra- 
diance of 1367.62 W/m^, which is approximately the maxi- 
mum irradiance observed at any time during the last solar 
activity cycle (green line, kVc was again set to zero). In the 
low-activity case, the red and black lines are identical, with 
best-fit parameters 'I'o = 0.99887 and kSVc = 12.7 ms~\ 
The three versions have very similar r.m.s. residuals, this 
time ~ 0.06 ms~^. In the high-activity case, it is the green 
and black lines (and the corresponding values of ^o) which 
are virtually indistinguishable, whereas the best-fit param- 
eters used to produce the red curve are 'I'o = 0.99926 and 
kSVc — 417 ms~^. Again, the three versions have very simi- 
lar root mean square (r.m.s.) residuals, ~ 0.5 ms~^. In both 
the low- and high-activity cases, fixing ^'o to 1 appears to 
give a better match visually than fitting for it, but it yields 
a slightly larger squared error and residual r.m.s. 

The most significant deviations between the synthetic 
and FF' RVs occur when a large sunspot group causes a 
sudden drop in the TSI, e.g. near HJD - 2450413. This nec- 
essarily induces a similarly large variation in the FF' out- 
put, which has no counterpart in the synthetic RVs of |Me-| 



unier et al. (2010b I. There are several possible reasons for 



TSI, and comparing the results to the synthetic RVs at least 
provides an internal consistency check. In particular, it is a 



these discrepancies: the insensitivity of the photometry to 
certain spot configurations, the limited treatment of faculae 
and plage in the FF' method, or unidentified issues with the 
synthetic RVs themselves. In the absence of measured solar 
RVs to provide a ground truth, all these possibilities must 
remain open. 

Nonetheless, the overall agreement between the FF' 
output and the synthetic RVs is very encouraging. This im- 
plies that the FF' method can certainly be used to study 
RV variations down to the ms~^ level. 



4.5 Application to Kepler data 

Given a sample of high-precision, high time-sampling light 
curves, the FF' method can be used to estimate the statis- 
tics of the activity-induced RV variability of the same sample 
of stars. The tens of thousands of high-precision light curves 
produced by the Kepler mission constitute an ideal dataset 
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Figure 8. Application of the FF' to synthetic solar data. The black dots show the synthetic TSI (top panel) and RV (bottom panel) 
variations of the Sun, from |Meunier et al.| l |2010b[ l, for two 6-month periods when the Sun was relatvely inactive (left) and active (right). 
In each case, the solid black line in the top panel shows the smoothed version of the synthetic TSI used as input to the FF' . The 
measured TSI (SoHO/VIRGO daily average, from http://www.pmodwrc.ch/, maintained by C. Frohlich) is also shown for comparison 
as the solid grey line. The solid black, thick dashed red and solid green lines in the bottom panel then show the FF' predictions based 
on that smoothed TSI, using different values of a-nd kSVc (see text for details). 
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Figure 9. Example Kepler Quarter 1 light curve before and after 
applying our systematics correction. The original, raw time-series 
is shown in grey, the corrected time-series in black, and the cor- 
rected time-series without the transits (used to estimate the RV 
modulations) in red. Note that the red line completely overlaps 
with the black, except during the transits. This example is KID 
3642741 (KOI 242). 



to do this. A systematic application of the FF' method to 
individual Kepler hght curves is beyond the scope of this 
paper, but as an illustration we now proceed to apply it to a 
subset of the hght curves in which the Kepler team identified 
transiting planet candidates ( Borucki et al.|20TT i. 

The Kepler photometric pipeline produces two versions 
of the hght curves: a 'raw' time-series, and a version cor- 
rected for most of the systematic instrumental effects. In 
the current version of the pipeline, this correction unfortu- 
nately also removes much of the intrinsic variability of the 
target stars. In the context of a separate study, focussed on 
the statistics of photometric variability in Kepler data, we 



Figure 10. Application of the FF' to Kepler quarter 1 light 
curves containing planet candidates. The small blue dots show 
the RV amplitudes derived from the light curves after removing 
the transits and smoothing on one tenth of the dominant light 
curve period, while the small red dots show the RV amplitude 
expected for a white noise-only light curve with the same high- 
frequency noise level, smoothed to the same extent. The black 
dots show the noise-corrected RV amplitude estimates, obtaining 
by subtracting the latter from the former in quadrature. 



have developed a more conservative systematics removal cor- 
rection, which is designed to preserve astrophysical signals. 
This correction will be described in detail in a forthcoming 
paper, so we only summarise the underlying principles here. 
Each Hght curve is decomposed into a linear combination of 
all the other light curves, plus an intrinsic component, us- 
ing Bayesian linear regression. The most significant trends 
that are common to many hght curves are identified using 
an information entropy criterion. They are then combined 
using principal component analysis, de-composed into their 
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intrinsic oscillatory modes, and removed from each individ- 
ual light curve, again using Bayesian linear regression. The 
process is repeated iteratively until no further trends are 
identified. The correction is currently available only for the 
data from Quarter 1, which were released to the public on 
June 15, 2010. We therefore focus on 601 of the 997 stars 



with planet candidates announced by Borucki et al. (2011 1, 



whose light curves were included in that release, and treat 
only the Quarter 1 data, which lasts 33 days. An example 
of the systematics correction applied to one of these objects 
is shown in Figure [9] 

After masking the transits, we first computed the peri- 
odogram of each light curve to identify the dominant peri- 
odicity, restricting the search to the range 0.5-50 days. The 
periods were not checked individually, and there is no guar- 
antee that they are related to a real, physical period (such 
as the rotation period of the target star). However, we use 
them to smooth the light curves, by applying the iterative 
non-linear filter of [Aigrain fc Irwin| ( |2004[ ), with a smooth- 
ing time-scale equal to one tenth of the identified period. We 
then apply the FF' method to the smoothed light curve. The 
results are, of course, affected not only by the intrinsic vari- 
ability of the host star, but also by any photometric noise 
still present in the smoothed light curve. The latter depends 
not only on the stars' magnitude, but also on the smooth- 
ing timescale used, which varies from star to star. This is 
unavoidable - excessive smoothing of a light curve with 
real, short-term variability would lead to under-estimated 
RV variations - but it is important to quantify the residual 
noise contribution to the RVs. In each case, we therefore 
also computed the RV variations expected for a simulated 
light curve containing only white Gaussian noise at the same 
level as the original, smoothed using the same timescale. The 
noise level was estimated as the standard deviation of the 
original light curve minus the smoothed version thereof. The 
noise-only RV amplitude was subtracted from the amplitude 
derived from the real light curve, yielding an estimate cor- 
rected for photometric noise. 

Figure [To] shows the resulting RV peak-to-peak ampli- 
tudes over the duration of Quarter 1, as a function of Kepler 
magnitude. The small blue dots show the amplitude mea- 
sured for the real light curves, and the small red dots the 
corresponding amplitudes for noise-only light curves, and 
the black dots show the noise-corrected estimates. The dis- 
tribution of the red points shows that the Kepler light curves 
can be used to estimate RV variations down to the 2-3 m s~^ 
level, below which they become noise-dominated in most 
cases. The distribution of the noise-corrected amplitudes is 
approximately log-normal, with a mean of ~ 4.1 ms~^ and 
a width of ~ 0.45 dex. Approximately two thirds of the can- 
didates are expected to display RV variations significantly 
above the noise limit. This does not translate directly into 
implications for the detectability of the RV signatures of the 
transit candidates, as the intrinsic RV variations may oc- 
cur on timescales which are quite distinct from the orbital 
period. Nonetheless, it does suggest that RV confirmation 
would be challenging for these candidates, the majority of 
which have radii below that of Neptune (for comparison, the 
expected RV-semi-amplitude for a Neptune-mass planet in 
a 10 day orbit around a Sun- like star is ~ 5 ms~^). The pre- 
dicted amplitude of the activity-induced RV signal is below 
the noise level for the remaning third of the candidates. 



5 CONCLUSIONS 

We have presented a new method to derive the stellar vari- 
ability expected in RV measurements from well-sampled 
light curves, which requires no knowledge of the rotation 
period or detailed spot modelling. We call this method FF' 
because it uses the light curve and its first derivative. A 
number of approximations are built into our method: in par- 
ticular, it ignores limb-darkening, as well as the photometric 
effect of taeniae (but not their RV signature), and the ex- 
pression used to compute the RV signal is accurate only 
to first order when multiple spots are present on the stel- 
lar surface. However, the observables (photometry and RV) 
are disk-integrated quantities, which mitigates the impact 
of these approximations. 

We tested this FF' method on time-series simulated 
using the simple spot model on which it is loosely based. 
Overall, our simulations show that the RMS, period and 
amplitude of periodic modulation of FF' predictions are in 
fairly good agreement with the corresponding model values. 
Given its simplicity, our method reproduces activity-induced 
variability surprisingly well, matching the latter's amplitude 
and distribution of power at the first few harmonics of the 
rotational frequency to ~ 10% or better in most cases. There 
are exceptions, where the power at odd-numbered harmon- 
ics is not well-matched. To understand this phenomenon, 
we used our spot-model to explore the relationship between 
photometric and RV variations due to spots in the general 
sense, and showed that any method which uses photome- 
try to simulate RVs will struggle to reproduce the signals if 
spot distribution has a significant odd-numbered multipole 
component. 

We then applied the FF' method to three example 
datasets: the benchmark SOPHIE/MOST observations of 
HD 189733, synthetic TSI and RV data for the Sun, and the 
Quarter 1 light curves of Kepler planet candidate host stars. 

For HD 189733, our results are essentially identical to 
those obtained by [Lanza et al.||2011[ ) with a much more so- 
phisticated method, reducing the power of the RV variations 
at the rotational frequency by a factor ~ 10, and by a factor 
of a few at the first few harmonics. Where both approaches 
struggle to reproduce some of the observed RV variations, 
we identify a possible explanation in their common reliance 
on the photometry to predict the RVs. 

The solar tests demonstrate the performance of the 
method down to the sub-ms~^ level. We show that, given 
enough high-quality RV data, it is possible to fit for the 
un-spotted fiux level, but this has only a relatively minor 
impact, and the results obtained by estimating it from the 
light curve itself are almost as good. We also show that it 
is possible to fit for the parameter n5Vc, which controls the 
importance of the convective term, but again this has rela- 
tively little effect on the results. These tests also illustrate 
the use of Gaussian processes, rather than simpler filters, 
to interpolate and smooth the photometric time-series. This 
opens up the possibility of applying the FF' method to cases 
where there is only sparse photometry, or where a photo- 
metric proxy is derived from the same spectra as the RV 
observations themselves. 

Finally, we have shown that the FF' method can be 
used to estimate RV amplitudes down to the 2-3 ms~^ level 
using Kepler data. Approximately two thirds of the 600 
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planet candidate host stars to which wc apphcd it arc ex- 
pected to display peak-to-peak RV amplitudes over 30-day 
timescales that are above that level. This confirms that ob- 
taining mass measurements for many of the Kepler terres- 
trial and/or longer-period planet candidates will be chal- 
lenging. However, it also opens up a range of possibilities 
for identifying the most promising candidates to follow up, 
and perhaps disentangling activity- and planet-induced RV 
signals for those objects. We caution that, like all methods 
based on photometry, some care must be taken when inter- 
preting the output of the FF' method for individual objects. 
Nonetheless, we have shown that it performs as well in that 
respect as other methods which have already been put to 
that use. 

The distinguishing characteristic of the FF' method, 
however, is its simplicity, speed, and lack of free parameters. 
In its simplest implementation, all it requires is an estimate 
of the stellar radius. The method can thus be applied to large 
samples of light curves such as those produced by the CoRoT 
and Kepler missions, to assess the RV jitter properties of a 
large sample of stars, and their implications for the detection 
of habitable planets by RV. No other method we are aware 
of can be applied to a large enough sample of light curves 
to undertake such a project, which would form a natural 
extension to the present paper. 
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APPENDIX A: CALCULATIONS 
UNDERLYING THE SPOT MODEL 

Al Trajectory of the spot 

Consider a spot located at latitude 5 on the surface of a star 
with radius i?* rotating with period Prot and inclination i 
(where i is defined as the angle between the star's rotation 
axis and the line-of-sight). We define two reference frames 
S and S' , sharing the same origin, which coincides with the 
centre of the star, and the same j/-axis. The stellar rotation 
vector defines the -|-2;-direction of frame S, whilst the x-axis 
of frame S' points towards the observer. In frame S, the 
location of the spot is given by: 



1 



where <j) — 2nt/Piot + 4>{) and <f>Q is the longitude of the spot 
at t = 0. The coordinates of the spot in frame S' are then 
obtained by performing a rotation by angle — (7r/2— i) about 
the y axis: 
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A2 Relative drop in flux due to the spot 

To calculate the quantities of interest, we need to evaluate 
P, the angle between the spot normal and the line of sight. 
It is easy to show (for example using the cosine rule on 
the triangle defined by the spot's position vector and its 
projection onto the line-of-sight) that this angle is simply 
given by 

cos j3 — x' /Ri, — cos 5 cos (j){t) sin i + sin 5 cos i. 

The relative drop in observed flux F, described by equa- 
tion ([T]), follows the same time dependence. 

In frame S, the vector describing the rotational motion 
of the stellar surface at the location of the spot is given by 



— Vcq COS 5 sin 
Vcq COS 5 cos (j) 




where Kq = 2-kR^,/P, 



In frame S' this becomes 
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— Vcq cos 5 sin ( 
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\4q cos 5 cos <f> 
Kq cos 5 sin (f) cos i 

The time dependence of the RV signature of the spot, 5RVs, 
is governed by the a;-component of this vector times F{t). 

The net convective up-welling velocity is radial, so its 
line-of-sight component is simply: 

5Vc = <5VcCos/3. 



APPENDIX B: GAUSSIAN PROCESS 
REGRESSION ON THE TOTAL SOLAR 
IRRADIANCE 

In order to apply the FF' method to the solar data, we need 
to convert the irregularly sampled, discontinuous synthetic 
TSI to a smooth, tightly sampled flux estimate. We do this 
using Gaussian process (GP) regression. 

GP models are extensively used in the machine learn- 
ing community for Bayesian inference in non-parametric re- 
gression and classification problems. By definition, any vec- 
tor of observations drawn from a Gaussian process has a 
joint distribution which is a multi-variate Gaussian. Fami- 
lies of random functions which share the same smoothness 
and covariance properties can be parametrised in terms of 
a covariance function or kernel, which specifies the covari- 
ance between pairs of points as a function of - typically - 
the distance between these points in some input space (e.g. 
the time interval between two observations). Thus, GP re- 
gression is particularly well-suited to modeling time-series 
containing correlated stochastic signals and noise. A brief 
introduction to GP regression, with an application to space- 
based, high-precision stellar photometry, albeit in a differ- 
ent context (exoplanet transmission spectroscopy), can be 
found in Gibson et al. (2011 1. The textbook by Rassmussen 
fc Williamsl ( |2006| |^| provides a more detailed treatment of 
GPs for both regression and classification. 



The most important step in GP regression is the choice 
of covariance function, or kernel. Most kernels are decreasing 
functions of the distance between pairs of points in the input 
space. One of the simplest and most widely used kernels is 
the squared exponential: 



ksE(t,t') = S^exp ( — 



2r2 



(Bl) 



where fcsE(i, i') is the covariance between observations taken 
at times t and t' , and r = \t — t'\ is the time interval between 
the two observations, S is a parameter controling the ampli- 
tude of the flux variations, and t is a parameter controlling 
the time-scale of the flux variations. 

Visual examination of the synthetic solar TSI indicated 
that the data might contain variations on more than one 
time-scale. Such behaviour can be modelled using a rational 
quadratic kernel: 



kKQ{t,t') 



(B2) 



where a is an additional parameter, controling the distribu- 
tion of timescales. Indeed, it can be shown ( [Rassmussen fc| 
Williams|20 06 ) that the rational quadratic kernel is equiva- 
lent to a superposition of an infinite number of squared ex- 
ponential kernels with a distribution of time-scales r that 
is a power-law of index —a. When a — > oo, the ratio- 
nal quadratic kernel approximates the squared exponential 
kernel. For finite a, the rational quadratic implies signifi- 
cantly more covariance at relatively large separation than 
the squared exponential (equivalent to a 'long tail' be- 
haviour). 

Stellar light curves, which display the effects of rotation- 
ally modulated, evolving active regions, tend to be quasi- 
periodic. This kind of behaviour can also be modelled by a 
GP, using a periodic kernel multiplied by a squared expo- 
nential term: 



6 exp 



sin^(7rr/P) 
2r2 



r 
2^2 



(B3) 



where P is the period in days, T is the time-scale of varia- 
tions within a period, and r is now the evolution time-scale 
(also in days), controling the rate of change of the shape 
and amplitude of the signal from one period to the next. We 
could also have combined the periodic term with a rational 
quadratic term, but initial experiments with that possibility 
indicated that this gave too much freedom to the model. 

These are only some of the possible kernels which are 
relevant to the type of dataset we are trying to model here, 
see Rassmussen & Williams ( 2006 1 for a more detailed dis- 



2 Available online at www. GaussianProcess . org/gpml. 



cussion of covariance functions. For a given set of kernel 
parameters, the GP defines a probability distribution over 
functions sharing the same covariance properties. This dis- 
tribution can be conditioned on any available observations, 
yielding a predictive distribution which can be used to in- 
terpolate the data to the desired sampling. The process also 
yields a marginal likelihood, which can be maximised with 
respect to the kernel parameters (which are also known as 
the hyper-parameters of the GP) in order to optimise the 
latter. 

We experimented with all the kernels listed above on 
the synthetic solar TSI, adding a delta-function to represent 
white noise: 
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ktot{t,t') = k{t,t') + e^5{r). (B4) 

In fact, the synthetic data we are modehng contain no ex- 
plicit white noise term, but the addition of a small constant 
noise term to the diagonal elements of the covariance matrix 
significantly helps convergence. 

In the high-activity case, the squared exponential ker- 
nel yielded a poor fit to the data (low marginal likelihood 
after optimising the hyper-parameters): a single time-scale 
is not sufficient to model the data. The quasi-periodic ker- 
nel also gave a relatively low marginal likelihood, with a 
short period (unrelated to the known solar rotation period), 
T S> 1 and t <^ P (note that, unlike t, T is dimension- 
less, because it divides the sin^ term, which varies between 
and 1). This implies that the active regions are evolving 
on timescales comparable to, or shorter than, the rotation 
period. With such a combination of hypor-paramctcrs, the 
quasi periodic GP behaves essentially like the squared ex- 
ponential. The best results were obtained with the rational 
quadratic kernel, with 9 — 0.0004, a — 1.5, r = 2.9 days and 
~ 0.0001 respectively {9 and 6'„ are in units of relative 
flux and a is dimensionless) . The mean of the predictive dis- 
tribution obtained with these hyper-parameters was used as 
the smoothed TSI. 

In the low-activity case, the best results were ob- 
tained with the squared exponential kernel, with 9 = 0.001, 
r = 12.7 days and 8^ = 0.00005. When using the rational 
quadratic kernel, the best-fit value of a was very large - 
which approximates the behaviour of a squared exponen- 
tial kernel. When excluding the sunspot crossing around 
HJD = 2450413, the quasi periodic kernel gave relatively 
good results with best-fit hyper parameters 9 — 0.0008, 
P = 28.9, T = 1.4, T = 48.3 and 0w = 0.00002. However, 
this kernel could not reproduce the TSI during the aforemen- 
tioned sunspot crossing, so we used the squared exponential 
to generated the smoothed TSI. 



